We introduce a new kind of percolation on finite graphs called jigsaw
percolation. This model attempts to capture networks of people who
innovate by merging ideas and who solve problems by piecing together
solutions. Each person in a social network has a unique piece of a jigsaw
puzzle. Acquainted people with compatible puzzle pieces merge their
puzzle pieces. More generally, groups of people with merged puzzle pieces
merge if the groups know one another and have a pair of compatible puzzle
pieces. The social network solves the puzzle if it eventually merges all
the puzzle pieces. For an Erdős-Rényi social network with $n$ vertices and
edge probability $p_n$, we define the critical value $p_c(n)$ for a connected
puzzle graph to be the $p_n$ for which the chance of solving the puzzle equals
1/2. We prove that for the $n$-cycle (ring) puzzle, $p_c(n) = \Theta(1/\log n)$, and
for an arbitrary connected puzzle graph with bounded maximum degree,
$p_c(n) = O(1/\log n)$ and $\omega(1/n^b)$ for any $b > 0$. Surprisingly, with high
probability, social networks with a power law degree distribution cannot
solve any bounded-degree puzzle. This model suggests a mechanism for recent
empirical claims that innovation increases with social density, and it
begins to show what social networks stifle creativity and what networks
collectively innovate.

1. Introduction. Creativity and innovation can seem like magic [32]. But what unifies the way 
people innovate and solve complex problems is that they collectively combine ideas and solutions. In 
short, innovation and problem solving are often social endeavors. The idea of lone geniuses undergoing 
eureka moments is a myth; instead, most breakthroughs in science, art and business emerge from 
connected groups of people exchanging and merging ideas [29, 52, 15]. 

Consider first that successful creative works — from inventions [29] to Broadway musicals [52, 51] 
to software [23] — combine ideas from different disciplines. With new digital tools, creativity is even 
more collaborative [39]. One can collect and prioritize ideas from thousands of people [47]; iteratively
improve ideas using crowds [43]; and collaboratively invent and design home products, clothing and 
cars with thousands of people [37, 19]. 

Next consider problem solving. Finding creative solutions to urban planning [2], software design [23]
and open questions in science [53, 30] increasingly requires teams of people who combine diverse 
expertise and ideas. Again, digital tools empower large networks of people to collaboratively solve 
problems by combining ideas and partial solutions. Examples include addressing climate change [28], 
designing software [33], and collaborating using a wiki to prove mathematical conjectures [21]. With 
mild incentives and simple Internet tools, one can leverage social networks to collectively assemble 
solutions, such as finding objects in a large country [45], piecing together shredded paper [44], folding 
proteins [25], or collaboratively solving complex tasks [31]. 

Companies increasingly recognize the importance of collaboration for innovation. Many companies
connect their employees using internal social networks (such as IBM's Beehive [40]) and expertise 
location systems (such as Tacit [12]) to match compatible ideas and expertise. When they cannot 
find the right expertise internally, companies outsource their most difficult R&D problems to leverage 
knowledge worldwide, using services like Innocentive and Kaggle. 

Technology is not the only reason for the increase in collaboration. The consistent rise in scientific
collaboration among universities, for instance, predates the Internet and has grown stably over
time [53]. As knowledge and specialization grow, so does the need to collaborate (often across
disciplines) in order to advance science [53, 30] and other creative endeavors [52, 51].

Researchers have begun to study how the structure of social networks correlates empirically with 
their creative and team output, such as in Broadway musicals [52, 51], scientific breakthroughs [10, 
20, 34] and sports teams [17]. For instance, producers of the most successful Broadway musicals have 
an intermediate number of triangles in their collaboration network, to balance the creative input of 
new collaborators with the creative stagnation (yet greater familiarity) of past collaborators [52, 51]. 
In general, teams solve problems better than individuals do [26, 13, 38]. 

However, the dynamics occurring in creative social networks remain elusive. Researchers have noted
both the importance and the difficulty of modeling social creativity: "...little is known about how
teamwork leads to greater creativity" [17]; "...discovery involves complex social and cognitive
processes and may not be predictable in detail" [7]; "...traditional epidemiological approaches need to
be refined to take knowledge heterogeneity into account and preserve the system's ability to promote
creative processes of novel recombinations of ideas" [34]. Epidemic models of single ideas that spread
like a slow, hard-to-catch disease in a social network [6, 8] do not model the space of ideas and how
ideas recombine to form new knowledge [34]. Other models of scientific discovery and innovation
include a branching process of new ideas mating with old ones [50]; ant colony models of scientists
seeking papers to cite like ants seeking food; information foraging theory, to model the costs and
benefits of hunting for references in bibliographic habitats; the A-B-C model of finding triadic closure
among ideas; and bridging structural holes (gaps between dense communities of networks) in networks
of people and ideas [11].

Although some of these frameworks model the space of ideas, none also model how people collaborate 
as they combine ideas with one another. A better understanding of idea combination may elucidate 
phase transitions in the creativity of social networks. Such phase transitions may explain why some 
communities in history (such as ancient Greece, Renaissance Florence and Elizabethan England) [3], 
companies (Apple, 3M, Google) [35] and cities (Silicon Valley, Tel Aviv) [9, 35] are so innovative and 
why other social networks lie in sub-optimal creative states. Our main results, Theorems 1 and 2, 
characterize such a phase transition. We find, roughly speaking, the required number of interactions 
among a group of people for them to collectively solve a large puzzle. 

1.1. Motivation for the jigsaw dynamic. Here we introduce a new kind of percolation on finite 
graphs that models a creative network of people merging compatible ideas into bigger and better 
ideas. The model is reminiscent of other models of percolation on graphs, such as bond percolation [24] 
and bootstrap percolation [27], but jigsaw percolation has more complex dynamics. 

Consider a social network of n people with vertex set V = {1, 2, . . . , n}, each of whom has a unique 
"partial idea" that could merge with one or more other partial ideas belonging to other people. These 
"partial ideas" can be thought of as pieces of a jigsaw puzzle: An idea is compatible with certain other 
ideas, just as a piece of a jigsaw puzzle can join with certain other puzzle pieces (in the correct solution 
of the puzzle). Thus we use "ideas" and "puzzle pieces" interchangeably. The two networks are

• the people graph $(V, E_{\text{people}})$, denoting who knows and communicates with whom;

• the puzzle graph $(V, E_{\text{puzzle}})$, denoting which ideas are compatible and thus can merge to form a
bigger, better idea.

In this paper, we assume each person has a unique idea, so there are $n$ ideas (puzzle pieces), and the
system of people and their compatible ideas is a graph with two sets of edges, $E_{\text{people}}$ and
$E_{\text{puzzle}}$. Allowing a person to have multiple ideas or multiple people to have the same idea
requires two vertex sets, which we leave for future work (see Section 5).

Next we propose a natural dynamic for people to merge their compatible ideas (puzzle pieces). If 
two people $u, w$ know one another and have compatible puzzle pieces (i.e., $uw \in E_{\text{people}} \cap E_{\text{puzzle}}$),
then they merge their puzzle pieces. After $u, w$ merge their puzzle pieces, we say that $u, w$ belong to
the same jigsaw cluster $U \subseteq V$. The general rule is that two jigsaw clusters $U, W$ merge if at least
two people (one from each cluster) know one another and at least two people (one from each cluster)
have compatible puzzle pieces. More precisely, we say that jigsaw clusters $U, W$ are people-adjacent if
$uw \in E_{\text{people}}$ for some $u \in U$, $w \in W$. Similarly, $U, W$ are puzzle-adjacent if $u'w' \in E_{\text{puzzle}}$ for some
$u' \in U$, $w' \in W$. Jigsaw clusters $U, W$ merge if they are both people-adjacent and puzzle-adjacent.

The motivation for this dynamic is the notion that after merging their ideas, a group of people can 
use any of those ideas to merge with the ideas of other people whom they know. We illustrate this in 
Figure 1. Here two nodes $u, w$ in different jigsaw clusters $U, W$ know one another ($uw \in E_{\text{people}}$), but
their puzzle pieces are incompatible ($uw \notin E_{\text{puzzle}}$). However, $u$ and $w$ have merged their puzzle pieces
with those of $u'$ and $w'$, respectively, and $u'$ and $w'$ do have compatible puzzle pieces ($u'w' \in E_{\text{puzzle}}$).
Thus $u$ can tell $w$ about her friend $u'$, and $w$ can tell $u$ about his friend $w'$. Then $u'$ and $w'$ merge
their compatible puzzle pieces, and the jigsaw clusters $U$ and $W$ merge.




Fig 1: Illustration of the jigsaw dynamic. Dashed and solid edges denote the people graph and puzzle 
graph, respectively. Jigsaw clusters U and W contain three and four nodes each. Nodes u, w know one 
another but do not have compatible puzzle pieces. However, they have merged their puzzle pieces with 
nodes u',w', who do have compatible puzzle pieces. Thus U and W merge. 



1.2. Definition of jigsaw percolation. Formally, jigsaw percolation on $(V, E_{\text{people}}, E_{\text{puzzle}})$ proceeds
in steps as follows. At every step $i \ge 0$, we have a collection $\mathcal{C}_i$ of disjoint subsets of $V$, where the
elements of $\mathcal{C}_i$, the jigsaw clusters, are labels on vertices that denote which puzzle pieces have merged
by step $i$.

1) Initially, $\mathcal{C}_0$ is the set of singletons $\{\{v\} : v \in V\}$.

2) After the first step, $\mathcal{C}_1$ is the set of connected components in the graph $(V, E_{\text{people}} \cap E_{\text{puzzle}})$.

3) After step $i \ge 1$, we have a collection of jigsaw clusters $\mathcal{C}_i$ that partition the set of vertices $V$. At
step $(i + 1)$, we merge every pair of jigsaw clusters in $\mathcal{C}_i$ that are both puzzle- and people-adjacent
(see Figure 2).

Note that three or more jigsaw clusters can merge simultaneously, as illustrated in Figure 2. 

It is useful to write jigsaw percolation as a dynamical system as follows. At step $i$, let $\mathcal{E}_i$ be the
unordered pairs of clusters in $\mathcal{C}_i$ that are people-adjacent and puzzle-adjacent. Then the jigsaw clusters
in $\mathcal{C}_{i+1}$ are the connected components of the graph $(\mathcal{C}_i, \mathcal{E}_i)$:

$$\mathcal{C}_{i+1} = \Big\{ \bigcup_{U \in A} U : A \text{ is a connected component of } (\mathcal{C}_i, \mathcal{E}_i) \Big\}. \tag{1.1}$$

Given $(V, E_{\text{people}}, E_{\text{puzzle}})$, we merge jigsaw clusters until no more merges can be made, i.e., iterate
Eq. (1.1) to a fixed point $\mathcal{C}_\infty$. After finitely many steps, no more merges can be made. We say that the
people graph solves the puzzle if all nodes belong to the same jigsaw cluster at the end of the process
(i.e., $\mathcal{C}_\infty = \{V\}$). Figure 3 illustrates a people graph that fails to solve a 2 x 2 puzzle.

Fig 2: Jigsaw clusters $U_1, U_2, U_3, U_4, U_5 \in \mathcal{C}_i$ at stage $i$. At stage $i + 1$, jigsaw clusters $U_1, U_2, U_3$ merge.



An equivalent definition of the process that is elegant and simple to code on the computer is to
iteratively contract nodes that are adjacent in $E_{\text{people}} \cap E_{\text{puzzle}}$ until no more contractions are possible.
The people graph solves the puzzle if this procedure ends with a single node.
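
The iteration of Eq. (1.1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes vertices are labeled 0, ..., n-1 and that both graphs are given as lists of vertex pairs, and the names `jigsaw_clusters` and `solves_puzzle` are hypothetical. A union-find structure stands in for node contraction; each pass of the `while` loop merges every pair of clusters that is both people-adjacent and puzzle-adjacent at the start of that pass.

```python
def jigsaw_clusters(n, people_edges, puzzle_edges):
    """Run jigsaw percolation on vertices 0..n-1 and return the final clusters."""
    parent = list(range(n))  # union-find forest; each root represents one cluster

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    merged = True
    while merged:  # one pass corresponds to one application of Eq. (1.1)
        merged = False
        # Pairs of current clusters joined by at least one edge of each kind.
        people_pairs = {frozenset((find(u), find(w))) for u, w in people_edges}
        puzzle_pairs = {frozenset((find(u), find(w))) for u, w in puzzle_edges}
        for pair in people_pairs & puzzle_pairs:
            if len(pair) == 2:  # skip edges that lie inside a single cluster
                a, b = (find(v) for v in pair)
                if a != b:
                    parent[a] = b
                    merged = True

    clusters = {}
    for v in range(n):
        clusters.setdefault(find(v), set()).add(v)
    return list(clusters.values())


def solves_puzzle(n, people_edges, puzzle_edges):
    """True if the process ends with a single jigsaw cluster."""
    return len(jigsaw_clusters(n, people_edges, puzzle_edges)) == 1
```

For example, with the path puzzle 0-1-2-3 and people edges (0,1), (2,3), (0,3), the first pass forms the clusters {0,1} and {2,3}; these are people-adjacent through (0,3) and puzzle-adjacent through (1,2), so the second pass merges them, mirroring the situation in Figure 1.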



[Figure 3 panels: step 0, step 1, step 2.]




Fig 3: A complete trajectory of the jigsaw dynamics. The people graph (dashed edges) does not solve 
this 2x2 puzzle. 



1.3. Statement of results. In this paper, we consider people graphs that are Erdős-Rényi random
graphs $G(n, p_n)$, in which each possible edge appears independently with probability $p_n$, with associated
probability distribution $\mathbb{P}_{p_n}$. For a fixed, connected puzzle graph of size $n$, we are interested in the
probability of the event

Solve := {the people graph solves the puzzle} = $\{\mathcal{C}_\infty = \{V\}\}$.

We denote this probability by $\mathbb{P}(\text{Solve})$ or by $\mathbb{P}_{p_n}(\text{Solve})$ to make explicit the value of $p_n$. Note that
the jigsaw dynamic is monotonic, in that adding more edges to the people graph or to the puzzle graph
cannot decrease the chance of solving the puzzle. Thus, for fixed $n$, $\mathbb{P}_p(\text{Solve})$ is nondecreasing with $p$.
Trivially, $\mathbb{P}_0(\text{Solve}) = 0$ and $\mathbb{P}_1(\text{Solve}) = 1$. Furthermore, $\mathbb{P}_p(\text{Solve})$ is a polynomial in $p$ of degree
at most $\binom{n}{2}$. Since a nonconstant, nondecreasing polynomial is continuous and strictly increasing,
for each $n$ there exists a unique $p \in (0, 1)$ such that $\mathbb{P}_p(\text{Solve}) = 1/2$, and we make
the following definition.

Definition 1. The critical value $p_c(n)$ for solving a connected puzzle is the unique value of
$p_n \in (0, 1)$ such that $\mathbb{P}_{p_n}(\text{Solve}) = 1/2$.

Remark 1. There is nothing special about the number 1/2. For our results, we could have taken
any fixed positive real number strictly smaller than 1. However, the critical value $p_c(n)$ depends on the
choice of the puzzle graph, which we suppress in the notation $p_c(n)$.
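
For small $n$, the probability in Definition 1 can be estimated by simulation. The sketch below is an illustration under stated assumptions rather than a method from the paper: vertices are labeled 0, ..., n-1, the function names are hypothetical, and the critical value is located by bisection on a Monte Carlo estimate. Reusing the same seed for every $p$ couples the sampled people graphs so that the estimate is nondecreasing in $p$, which keeps the bisection well defined; the result is still only a noisy approximation of $p_c(n)$.

```python
import random


def solves(n, people_edges, puzzle_edges):
    """Jigsaw percolation via union-find; True if one cluster remains."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    changed = True
    while changed:
        changed = False
        people = {frozenset((find(u), find(w))) for u, w in people_edges}
        puzzle = {frozenset((find(u), find(w))) for u, w in puzzle_edges}
        for pair in people & puzzle:
            if len(pair) == 2:
                a, b = (find(v) for v in pair)
                if a != b:
                    parent[a] = b
                    changed = True
    return len({find(v) for v in range(n)}) == 1


def estimate_solve_probability(n, p, puzzle_edges, trials=200, seed=0):
    """Fraction of sampled Erdos-Renyi people graphs G(n, p) that solve the puzzle."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        people = [(u, w) for u in range(n) for w in range(u + 1, n)
                  if rng.random() < p]
        wins += solves(n, people, puzzle_edges)
    return wins / trials


def estimate_critical_value(n, puzzle_edges, trials=200, iterations=12, seed=0):
    """Bisection for the p at which the estimated solve probability crosses 1/2."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if estimate_solve_probability(n, mid, puzzle_edges, trials, seed) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

A call such as `estimate_critical_value(8, [(i, (i + 1) % 8) for i in range(8)])` returns a rough estimate for the 8-cycle puzzle; larger `trials` reduces the sampling noise at the cost of run time.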

Remark 2. If the people graph is not connected, then the puzzle cannot be solved. Thus $p_c(n) \ge t_n$,
where $t_n$ is the unique real number such that $\mathbb{P}(G(n, t_n) \text{ is connected}) = 1/2$. Asymptotically we have
$t_n \sim (\log n - \log\log 2)/n$ (see [18]). Note that the equality $p_c(n) = t_n$ holds when the puzzle graph is
the star graph $(\{1, 2, \ldots, n\}, \{(i, n) : 1 \le i < n\})$, because in this case the puzzle can be solved iff the
people graph is connected.

We use the following standard notation for describing sequences of non-negative real numbers $a_n$
and $b_n$: $a_n = O(b_n)$ means there exists $C > 0$ so that $a_n \le C b_n$ for all sufficiently large $n$; $a_n = \Theta(b_n)$
means $a_n = O(b_n)$ and $b_n = O(a_n)$; $a_n = o(b_n)$ means $a_n/b_n \to 0$ as $n \to \infty$; and $a_n = \omega(b_n)$ means
$b_n = o(a_n)$.

Our main results are the following two theorems. 

Theorem 1 (Ring puzzle). If the people graph is the Erdős-Rényi random graph and the puzzle
graph is the $n$-cycle, then

$$\frac{1}{27 \log n} \le p_c(n) \le \frac{\pi^2}{6 \log n}\,(1 + o(1)).$$

Moreover, for $p_n = \lambda/\log n$, $\mathbb{P}_{p_n}(\text{Solve}) \to 0$ or $1$ according as $\lambda < 1/27$ or $\lambda > \pi^2/6$.

Remark 3. We believe that our upper bound is tight (see Section 5). We did not attempt to 
optimize the constant 1/27 in the lower bound; this value was chosen to make the proof easier to read. 
We do not think that our proof method will yield an optimal lower bound. 

Theorem 2 (Connected puzzle of bounded degree). For an Erdős-Rényi people graph solving a
connected puzzle with bounded maximum degree, $p_c(n) = O(1/\log n)$ and $p_c(n) = \omega(1/n^b)$ for any $b > 0$.
In particular, we have $\mathbb{P}_{p_n}(\text{Solve}) \to 0$ for $p_n = O(1/n^b)$ for any $b > 0$, and $\mathbb{P}_{p_n}(\text{Solve}) \to 1$ for
$p_n = \lambda/\log n$ with $\lambda > \pi^2/6$.

Remark 4. The upper bound for $p_c(n)$ in Theorem 2 holds for any connected puzzle graph, even
with maximum degree growing with $n$ as $n \to \infty$ (see Proposition 1). The star graph example in
Remark 2 provides a counterexample to the lower bound when the maximum degree is unbounded.

Remark 5. The jigsaw dynamic is symmetric under swapping the people and puzzle graphs. Thus,
Theorems 1 and 2 also apply to a ring and bounded-degree people graph (respectively) solving an
Erdős-Rényi puzzle.

As a model of social networks, the Erdős-Rényi random graph assumes no structure other than
the average number of connections per person. However, in many social networks, from scientific
citations [46] to scientific collaborations [4, 41, 42] to sexual partners [36], some people have orders
of magnitude more connections than others. The broad-scale degree distributions of such networks are
well described by a power law (or by a power law with a cutoff), in which a random vertex has degree
$k$ with probability proportional to $k^{-\gamma}$ for some $\gamma > 2$.

Surprisingly, such heterogeneous social networks cannot solve a large class of puzzles. We write the
following corollary to Theorem 2 for a random power-law degree people graph, but it applies to any
random people graph with a degree distribution that has finite $(1 + \epsilon)$th moment for some $\epsilon > 0$ and
that, given a degree sequence, is uniform over all simple graphs with that sequence.

Corollary 1. For a random power-law degree people graph with exponent $\gamma > 2$ and any connected,
bounded-degree puzzle graph, the probability of solving the puzzle tends to 0 as $n \to \infty$.

Remark 6. Because collaboration networks in science [4, 41, 42] manage to collectively solve
puzzles despite their degree distributions being well modeled by power laws with exponential decay,
more realistic assumptions, such as unbounded-degree puzzles, merit future work (see Section 5 for
details).

Proof of Corollary 1. Let $d_i$ be the degree of vertex $i$ in the people graph. If $f(n) \to \infty$ as
$n \to \infty$, then

$$\mathbb{P}\Big( \max_{1 \le i \le n} d_i > f(n)\, n^{1/(\gamma-1)} \Big) \le \frac{\mathbb{E}\, d_1^{\gamma-1}}{f(n)^{\gamma-1}} \to 0,$$

because $\mathbb{E}\, d_i^{\gamma-1}$ is finite. Choose $0 < b < (\gamma - 2)/(\gamma - 1)$. By Theorem 2, if the people graph is
Erdős-Rényi with $p = n^{-b}$, then the probability of solving a bounded-degree puzzle tends to 0 as $n \to \infty$. Since
the minimum degree of this Erdős-Rényi people graph is bounded below by $n^{1-b}(1 - o(1)) = \omega(n^{1/(\gamma-1)})$
with high probability, the Erdős-Rényi people graph stochastically dominates the power-law people
graph in terms of edge density. By monotonicity of the event Solve, this implies that the power-law
people graph cannot solve the puzzle with high probability. □

Some of the techniques in our proofs resemble those used for long range percolation and bootstrap 
percolation, but our arguments differ in key ways. In our proof of the lower bound on $p_c(n)$ for the
ring puzzle graph, we show that a set of cut points, which must separate jigsaw clusters in the final
configuration, exists with high probability for sufficiently small $p$. This is similar in spirit to finding
a positive density of points over which no edge crosses to show that no infinite component exists in
one-dimensional long range percolation [48, 14].

In our proof of the upper bound on $p_c(n)$, we use the fact that once a sufficiently large, solved
cluster emerges, then that cluster will inevitably continue to merge and ultimately solve the puzzle. As 
in bootstrap percolation on the lattice graph [1, 27], our upper bound arises from a sufficient condition 
for the formation of a large cluster. 

Before proceeding, we note similarities between this model and several real world phenomena. First, 
it vaguely captures how people solve jigsaw puzzles: Dump the puzzle pieces on the table so that 
they land randomly, look for local connections, and then iteratively merge nearby clumps of merged 
puzzle pieces. Jigsaw percolation also resembles more consequential problems in several scientific fields. 
In protein assembly, peptides merge into meta-proteins, which further merge to form proteins [54]. 
Whether certain molecules merge together or tile the plane underlies the computations of many DNA 
self-assembly systems [49]. Finally, in creative social networks ranging from companies to communities
of scientists, people combine their compatible ideas into bigger, better ideas, in a manner vaguely 
similar to jigsaw percolation. The phase transition in the probability that a random graph solves 
a jigsaw puzzle begins to inform what properties of these social networks facilitate their ability to 
collaboratively solve problems and to innovate. 

1.4. Road map for the paper. In Section 2, we prove the upper bound on the critical value $p_c(n)$
for both Theorems 1 and 2. In Section 3, we prove the lower bound for the ring puzzle in Theorem 1, 
and in Section 4 we prove the lower bound for arbitrary puzzles with bounded maximum degree. In 
Section 5, we discuss simulations and open questions. 

2. Upper bound on the critical value. In this section, we prove that the critical value has
upper bound $\pi^2/(6 \log n)$ for any connected puzzle graph.

Proposition 1 (Upper bound for the critical value). For an Erdős-Rényi people graph and any
connected puzzle graph on $n$ vertices, if $\lambda > \pi^2/6$ and $p_n = \lambda/\log n$, then

$$\lim_{n \to \infty} \mathbb{P}_{p_n}(\text{Solve}) = 1.$$

Remark 7. A close look at the proof of Proposition 1 reveals that the same conclusion is true as
long as $p_n \ge \frac{\pi^2}{6 \log n} \cdot (1 + c \log\log n/\log n)$ for some constant $c \in (0, \infty)$.

For simplicity, one can look at the ring puzzle graph (the $n$-cycle), with

$$E_{\text{puzzle}} = \{(1, 2), (2, 3), \ldots, (n - 1, n), (n, 1)\}.$$

The idea of the proof is the following sufficient condition to solve the ring puzzle, illustrated in Figure 4.
Suppose that in the people graph, node 2 is adjacent to node 1; node 3 is adjacent to 1 or 2; node 4
is adjacent to 1, 2 or 3; and so on, so that node $j$ is people-adjacent to at least one of $\{1, 2, \ldots, j - 1\}$
for all $2 \le j \le n$ (as illustrated in Figure 4). Then the people graph solves the puzzle.




[Figure 4 image omitted; legend: puzzle edges, people edges.]

Fig 4: Illustration of the sufficient condition to solve the ring puzzle: $j$ is people-adjacent to
$\{1, 2, \ldots, j - 1\}$ for all $j = 2, 3, \ldots, n$. This event is contained in the event Solve.



However, to obtain a good bound, we do not consider solving the whole puzzle in the manner 
depicted in Figure 4. Instead, we partition the puzzle graph into disjoint blocks and use the sufficient 
condition depicted in Figure 4 within each block. If the blocks are sufficiently large, then solving just 
one block suffices to solve the whole puzzle. We call a set $U \subseteq V$ internally solved if the people graph
induced on U can solve the puzzle graph induced on U. 

Lemma 1. Consider an Erdős-Rényi people graph and any connected puzzle graph on $n$ vertices.
Then for any fixed $\delta > 0$,

$$\mathbb{P}_{p_n}\big(\text{Solve} \mid \exists \text{ an internally solved subset of size} \ge (1 + \delta) \log n/\epsilon_n\big) \ge 1 - n^{-\delta},$$

where $\epsilon_n := -\log(1 - p_n)$.

Proof. Suppose that there exists an internally solved subset $U$ of size $m \ge (1 + \delta) \log n/\epsilon_n$. The
probability that all the remaining $n - m$ vertices in $V \setminus U$ are connected to $U$ by a people edge is

$$(1 - (1 - p_n)^m)^{n-m} \ge (1 - e^{-\epsilon_n m})^n \ge 1 - n e^{-\epsilon_n m} \ge 1 - n^{-\delta}.$$

Note that by connectivity of the puzzle graph and people graph, the event that all vertices in $V \setminus U$
are connected to $U$ by people edges implies Solve. The proof is now done, because the event that a
particular set of vertices forms an internally solved subset depends only on the edges among those
vertices. □
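
As a sanity check, the chain of inequalities in this proof can be evaluated numerically for concrete parameters. The snippet below is only an illustrative sketch; the function name `lemma1_chain` and the sample values of $n$, $p$ and $\delta$ are assumptions chosen for the example, and $m$ is taken to be the smallest admissible integer.

```python
import math


def lemma1_chain(n, p, delta):
    """Return m and the four quantities in the Lemma 1 inequality chain."""
    eps = -math.log(1.0 - p)                          # eps_n := -log(1 - p_n)
    m = math.ceil((1.0 + delta) * math.log(n) / eps)  # smallest admissible subset size
    exact = (1.0 - (1.0 - p) ** m) ** (n - m)         # all n - m outside vertices attach to U
    middle = (1.0 - math.exp(-eps * m)) ** n          # enlarge the exponent from n - m to n
    union = 1.0 - n * math.exp(-eps * m)              # Bernoulli / union bound
    target = 1.0 - n ** (-delta)                      # claimed lower bound
    return m, exact, middle, union, target


m, exact, middle, union, target = lemma1_chain(n=10_000, p=0.2, delta=0.5)
print(m, exact, middle, union, target)
```

For these sample values the subset size is 62 and each quantity in the chain is at least as large as the next, ending at $1 - n^{-\delta} = 0.99$.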

We use the following lemma to partition the puzzle graph into disjoint blocks. The motivation comes 
from analyzing the ring puzzle graph. 

Lemma 2. Let $m \ge 1$ be a fixed integer. For any connected graph $G$ with vertex set $V$, there exists
an integer $k \ge |V|/(2m)$ and subsets $B_1, B_2, \ldots, B_k$ of $V$ such that

ii) $|B_i| \in [m, 2m]$ for $i = 1, 2, \ldots, k - 1$ and $|B_k| \le 2m$;
iii) the induced subgraph on $B_i$ is connected for all $i = 1, 2, \ldots, k$;
iv) $B_i$ and $B_j$ share at most one vertex in common for all $1 \le i < j \le k$.

Proof. The proof proceeds by induction on $n := |V|$. The lemma is obviously true for $n \le 2m$, so
let us assume that $n \ge 2m + 1$.

For any connected graph $G$ of size $n$, fix a spanning tree $T$ of $G$. Removing a single vertex $v_0$ from
the tree $T$ results in finitely many disjoint components $C_1, C_2, \ldots, C_k$, each of which has a unique
marked vertex adjacent to $v_0$ in $T$. We consider three disjoint cases.

Case 1. If one of the components has size in $[m, 2m]$, we define this component as $B_1$ and
use induction on the graph $G$ with the vertex set $B_1$ removed, which is still connected.

Case 2. If all of the components have size $< m$, define $l$ as the smallest integer such that
$|C_1| + |C_2| + \cdots + |C_{l-1}| < m$ and $|C_1| + |C_2| + \cdots + |C_l| \ge m$. Such an $l$ exists, because
$|C_1| + |C_2| + \cdots + |C_k| = n - 1 \ge m$. Necessarily we have $|C_1| + |C_2| + \cdots + |C_l| < 2m$, because
$|C_i| < m$ for all $i$. We take $B_1 := \bigcup_{i=1}^{l} C_i \cup \{v_0\}$ and use induction on the graph $G$ with vertex
set $\bigcup_{i=1}^{l} C_i$ removed (note that $v_0$ will appear in more than one subset, because it has not yet been
removed from $G$).

Case 3. If none of the components has size in $[m, 2m]$ and at least one component has
size $> 2m$, we choose one such component (and ignore the other components), call it $V_1$, and remove
the marked vertex $v_1$ from it. Removing $v_1$ creates several new components, each containing a marked
vertex adjacent to $v_1$ in $T$. We repeat this procedure until reaching the following situation: The size
of $V_k$ is $> 2m$, but if we remove the marked vertex $v_k$ from it, then all the resulting components have
size $\le 2m$. If one of them has size at least $m$, then we take that component as $B_1$, and we continue
by induction with the rest of the tree, which is connected by construction. If all of the components
have size $< m$, we follow the steps in Case 2 to define $B_1$ and continue by induction.

To complete the proof we need to check properties iii) and iv) for each block $B_i$, which follow easily
from the spanning tree and marked vertex construction. □

Proof of Proposition 1. Using Lemma 2, we partition the puzzle graph into blocks $B_1, B_2, \ldots, B_k$
of size $\le 2m$ (where $m$ is determined later) with $|B_i| \ge m$ for all $i < k$. Note that $k \ge n/(2m)$. Let $\mathcal{B}_i$
be the event that block $B_i$ is solved using only people edges in block $B_i$. Let $S := \sum_{i=1}^{k-1} \mathbf{1}_{\mathcal{B}_i}$ be the
number of blocks (excluding the last block $B_k$) that are solved using people edges only within each
block (i.e., internally solved). The events $\mathcal{B}_i$ are independent because the blocks use disjoint sets of
edges, and their indicators are Bernoulli random variables with mean $\mathbb{P}(\mathcal{B}_i)$.

Next we show that if $p = \lambda/\log n$ with $\lambda > \pi^2/6$, then

$$\mathbb{P}(S \ge 1) \to 1 \quad \text{as } n \to \infty.$$

Consider the subgraph of the puzzle graph induced by $B_i$. We can fix a rooted spanning tree and
label the vertices with integers $1, 2, \ldots, |B_i|$ in such a way that the vertex with label $j$ is puzzle-adjacent
to the set of vertices with labels $\{1, 2, \ldots, j - 1\}$ in the spanning tree for all $j > 1$. As illustrated in
Figure 4, a sufficient condition for the event $\mathcal{B}_i$ to occur is the event

$$\tilde{\mathcal{B}}_i := \{\text{for all } 1 < j \le |B_i|, \text{ the vertex labeled } j \text{ is people-adjacent to the set of vertices labeled } \{1, 2, \ldots, j - 1\}\} \subseteq \mathcal{B}_i.$$

(Note that there could be other ways to solve the puzzle. For example, in the case of a ring puzzle,
$j$ is people-adjacent to $j + 1$, and $j + 1$ (but not $j$) is people-adjacent to $\{1, \ldots, j - 1\}$. Thus $\tilde{\mathcal{B}}_i$ is
not a necessary condition for $\mathcal{B}_i$ to occur, i.e., $\tilde{\mathcal{B}}_i \subsetneq \mathcal{B}_i$.) The events that $j + 1$ is people-adjacent to
$\{1, 2, \ldots, j\}$ occur independently with probability $1 - (1 - p_n)^j$, so

$$\mathbb{P}(\tilde{\mathcal{B}}_i) \ge \prod_{j=1}^{|B_i| - 1} \big(1 - (1 - p_n)^j\big) \ge \prod_{j=1}^{2m} \big(1 - (1 - p_n)^j\big).$$

Thus the random variable $S$ stochastically dominates

$$S' \sim \text{Binomial}\Big(k - 1, \ \prod_{j=1}^{2m} \big(1 - (1 - p_n)^j\big)\Big).$$

For $n \in \mathbb{N}$, let $\epsilon_n := -\log(1 - p_n)$, so that $1 - p_n = e^{-\epsilon_n}$. We use the next lemma to obtain a lower
bound on

$$\log \mathbb{E} S' = \log(k - 1) + \sum_{j=1}^{2m} \log\big(1 - e^{-j\epsilon_n}\big).$$

The proof of Lemma 3 follows the present proof.

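Lemma 3. Let $\theta(x) := -\int_0^x \log(1 - e^{-t})\, dt$ for $x \in [0, \infty]$. If $\lim_{\epsilon \to 0} m_\epsilon \epsilon = x \in [0, \infty]$, then

$$\lim_{\epsilon \to 0} \epsilon \sum_{i=1}^{m_\epsilon} \log\big(1 - e^{-i\epsilon}\big) = -\theta(x).$$

Moreover, for all $m \ge 1$ and $\epsilon > 0$,

$$\Big| \sum_{i=1}^{m} \log\big(1 - e^{-i\epsilon}\big) + \frac{\pi^2}{6\epsilon} \Big| \le \frac{1}{2} \log\frac{2e^2}{\epsilon} + \frac{\pi^2}{6 \epsilon e^{m\epsilon}}. \tag{2.1}$$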
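
The limit in Lemma 3 is easy to probe numerically in the case $x = \infty$, where $\theta(\infty) = \pi^2/6$. The following sketch is an illustration only (the function name `scaled_log_sum` and the sample values of $\epsilon$ and $m$ are assumptions for the example): it evaluates $\epsilon \sum_{i=1}^{m} \log(1 - e^{-i\epsilon})$ with $m\epsilon = 20$ and compares it with $-\pi^2/6$.

```python
import math


def scaled_log_sum(eps, m):
    """Compute eps * sum_{i=1}^{m} log(1 - exp(-i * eps))."""
    # log1p(-y) evaluates log(1 - y) accurately when y is small.
    return eps * sum(math.log1p(-math.exp(-i * eps)) for i in range(1, m + 1))


limit = -math.pi ** 2 / 6
for eps, m in [(0.01, 2_000), (0.001, 20_000)]:
    value = scaled_log_sum(eps, m)
    print(f"eps={eps}: value={value:.5f}, gap to -pi^2/6 = {value - limit:.5f}")
```

The gap shrinks as $\epsilon$ decreases, roughly like $\epsilon \log(1/\epsilon)$, which is the order of the error term in (2.1) after multiplying through by $\epsilon$.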



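Fix $\delta \in (0, n/\lceil \log n/\epsilon_n \rceil - 1)$, and let $m = \lceil (1 + \delta)(\log n)/\epsilon_n \rceil$. Using Lemma 3 and
$\epsilon_n \ge p_n = \lambda/\log n$, we estimate

$$\begin{aligned}
\log \mathbb{E} S' &\ge \log\Big(\frac{n}{2m} - 1\Big) + \sum_{j=1}^{2m} \log\big(1 - e^{-j\epsilon_n}\big) \\
&\ge \Big(1 - \frac{\pi^2}{6\lambda}\Big) \log n - \log\frac{2m}{1 - 2m/n} - \frac{1}{2} \log\frac{2e^2}{\epsilon_n} - \frac{\pi^2}{6 \epsilon_n e^{2m\epsilon_n}} \\
&\ge \Big(1 - \frac{\pi^2}{6\lambda}\Big) \log n - \frac{5}{2} \log\log n - \frac{1}{2} \log\frac{8e^2(1 + \delta)^2}{\lambda^3} - \frac{\pi^2 \log n}{6\lambda n^{2 + 2\delta}} \\
&\to \infty \quad \text{as } n \to \infty.
\end{aligned}$$

Since $S'$ is binomial, $\mathbb{E} S' \to \infty$ implies that $\mathbb{P}(S' \ge 1) \to 1$. Thus we have $\mathbb{P}(S \ge 1) \ge \mathbb{P}(S' \ge 1) \to 1$.
On the event $\{S \ge 1\}$, the whole puzzle is solved with probability tending to one by Lemma 1 and
the fact that $m \ge (1 + \delta)(\log n)/\epsilon_n$. The proof is complete. □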

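Proof of Lemma 3. Write $k$ for the upper limit of the sum. Note that

$$\begin{aligned}
-\epsilon \sum_{i=1}^{k} \log\big(1 - e^{-i\epsilon}\big) &= \epsilon \sum_{i=1}^{k} \sum_{j=1}^{\infty} \frac{e^{-ij\epsilon}}{j} = \epsilon \sum_{j=1}^{\infty} \frac{1 - e^{-jk\epsilon}}{j\,(e^{j\epsilon} - 1)} \\
&= \sum_{j=1}^{\infty} \frac{1 - e^{-jk\epsilon}}{j^2} - \sum_{j=1}^{\infty} \frac{(1 - e^{-jk\epsilon})(e^{j\epsilon} - 1 - j\epsilon)}{j^2 (e^{j\epsilon} - 1)}.
\end{aligned}$$

Using the power series expression of $e^x$, it is easy to see that $(e^x - 1 - x)/(e^x - 1) \le \min\{x/2, 1\}$.
Applying the last inequality, we have

$$\sum_{j=1}^{\infty} \frac{(1 - e^{-jk\epsilon})(e^{j\epsilon} - 1 - j\epsilon)}{j^2 (e^{j\epsilon} - 1)} \le \sum_{j=1}^{\infty} \frac{\min\{j\epsilon/2, 1\}}{j^2} \le \sum_{j=1}^{m} \frac{\epsilon}{2j} + \sum_{j > m} \frac{1}{j^2} \le \frac{\epsilon}{2} (\log m + 1) + \frac{1}{m} = \frac{\epsilon}{2} \log\frac{2e^2}{\epsilon},$$

using $m = 2/\epsilon$. Thus, combining the last two displays,

$$\Big| \sum_{i=1}^{k} \epsilon \log\big(1 - e^{-i\epsilon}\big) + \sum_{j=1}^{\infty} \frac{1 - e^{-jk\epsilon}}{j^2} \Big| \le \frac{\epsilon}{2} \log\frac{2e^2}{\epsilon}.$$

In particular, if $\lim_{\epsilon \to 0} k_\epsilon \epsilon = x \in [0, \infty]$, then interchanging the sum and the integral,

$$\lim_{\epsilon \to 0} \epsilon \sum_{i=1}^{k_\epsilon} \log\big(1 - e^{-i\epsilon}\big) = -\sum_{j=1}^{\infty} \frac{1 - e^{-jx}}{j^2} = -\sum_{j=1}^{\infty} \frac{1}{j} \int_0^x e^{-jt}\, dt = \int_0^x \log(1 - e^{-t})\, dt, \tag{2.2}$$

which completes the proof. The bound (2.1) follows from (2.2) and the fact that $e^{-jk\epsilon} \le e^{-k\epsilon}$ for all
$j \ge 1$. □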

3. Lower bound for the ring puzzle. In this section, we prove a matching-order lower bound
for an Erdős-Rényi people graph solving the ring puzzle. The idea of the proof is to show the existence
of a cut set that divides the ring into pieces that never merge.

Proposition 2. For the ring puzzle graph, if $\lambda < 1/27$ and $p_n = \lambda/\log n$, then $\mathbb{P}_{p_n}(\text{Solve}) \to 0$.
Therefore $p_c(n) \ge 1/(27 \log n)$.

Proof of Proposition 2. Let $x$ be a fixed positive integer to be chosen later [it will be $\Theta(\log n)$].
We will identify the vertices in the ring puzzle graph $(V, E_{\text{puzzle}})$ with elements from $\mathbb{Z}_n$, so that two
vertices $u, v \in \mathbb{Z}_n$ are neighbors iff $u - v = \pm 1$, where all additions and subtractions in $\mathbb{Z}_n$ are modulo
$n$. We denote the interval $\{a, a + 1, \ldots, b\} \subseteq \mathbb{Z}_n$ by $[a, b]$ and its length by $|[a, b]|$.

Given an interval $I = [a, b] \subseteq \mathbb{Z}_n$, we call it $x$-good if there is a vertex $u \in I$ such that $u$ is not
people-adjacent to any vertex in the interval $[a - x, b + x]$. We call the vertex $u \in I$ an $x$-good vertex
in $I$. The proof hinges on the following observation. Loosely speaking, if throughout the puzzle there
are people unacquainted with anyone in a sufficiently large neighborhood of the puzzle, then these
people obstruct the growing solution, and the social network cannot solve the puzzle.

Lemma 4. Suppose that there exist integers $0 = a_0 < a_1 < \cdots < a_k = n$ such that, for all
$j = 0, 1, \ldots, k - 1$, the interval $I_j := [a_j + 1, a_{j+1}]$ is $x$-good and has length $|I_j| \le x$. Then the puzzle
cannot be solved.



Proof. Let $v_j \in I_j$ be an $x$-good vertex in $I_j$ for $j = 0, 1, \ldots, k - 1$. Clearly $1 \le v_0 < v_1 < \cdots <
v_{k-1} \le n$. Furthermore, each $v_j$ has no people edges with $[v_{j-1}, v_{j+1}]$ (where $j \pm 1$ is taken modulo $k$),
because $|I_j| \le x$.

Suppose for contradiction that the puzzle can be solved. Then there must exist a first stage, $i$, after
which there exists an index $j$ such that two distinct vertices, $u \in [v_j, v_{j+1}]$ and $v \in [v_{j+1}, v_{j+2}]$, belong
to the same cluster in $\mathcal{C}_i$. One of these vertices must be $v_{j+1}$ (without loss of generality, $u = v_{j+1}$),
because otherwise $v_{j+1}$ would have to belong to a larger cluster in $\mathcal{C}_{i-1}$, and therefore $v_{j+1}$ would have
merged at an earlier stage of the process, which is a contradiction. Since $v_{j+1}$ is not people-adjacent
to any other vertices in $[v_{j+1}, v_{j+2}]$, $v$ must be in a component in $\mathcal{C}_{i-1}$ that contains vertices outside
of $[v_{j+1}, v_{j+2}]$, but this is also a contradiction. Thus the puzzle cannot be solved. □
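
The definition of an $x$-good interval translates directly into a small check. The sketch below is illustrative only and makes several assumptions not found in the text: vertices of $\mathbb{Z}_n$ are labeled 0, ..., n-1, the people graph is a list of vertex pairs, and the function name `is_x_good` is hypothetical. An interval $[a, b]$ is reported as $x$-good when some vertex in it has no people edge into the window $[a - x, b + x]$, with all arithmetic modulo $n$.

```python
def is_x_good(a, b, x, n, people_edges):
    """True if some vertex of the cyclic interval [a, b] has no people edge into [a - x, b + x]."""
    length = (b - a) % n + 1
    interval = [(a + t) % n for t in range(length)]
    # The window is capped at n vertices in case it wraps around the whole ring.
    window = {(a - x + t) % n for t in range(min(n, length + 2 * x))}

    neighbors = {v: set() for v in range(n)}
    for u, w in people_edges:
        neighbors[u].add(w)
        neighbors[w].add(u)

    return any(neighbors[u].isdisjoint(window) for u in interval)
```

For instance, on a ring of 12 vertices with $x = 2$, the interval $[0, 2]$ has window $\{10, 11, 0, 1, 2, 3, 4\}$. With the single people edge (0, 6) the interval is $x$-good, whereas with people edges (0, 1) and (2, 3) every vertex of the interval has a neighbor inside the window, so it is not.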

In light of Lemma 4, to complete the proof we need to show the existence of such intervals with
probability tending to 1. Suppose $n > x^2$. Define $k := \lfloor n/(x - 1) \rfloor \le n$. Define

\[ l_i := x \ \text{ for } 1 \le i \le n - k(x - 1), \qquad l_i := x - 1 \ \text{ for } n - k(x - 1) < i \le k, \]

and $a_i := l_1 + l_2 + \cdots + l_i$ for $i = 0, 1, \ldots, k$. Clearly all the intervals $I_i := [a_i + 1, a_{i+1}]$, $0 \le i \le k - 1$,
are of length $x - 1$ or $x$. Let $Z$ be the number of intervals that are not $x$-good,

\[ Z := \sum_{i=0}^{k-1} \mathbf{1}\{\text{the interval } I_i \text{ is NOT } x\text{-good}\}. \]

It suffices to show that $\mathbb{P}(Z > 0) \to 0$ as $n \to \infty$ for an appropriate choice of $x$. We will use Lemma 5 to
estimate the probability that an interval is not $x$-good.
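
The partition of $\{1, \ldots, n\}$ into intervals of length $x$ or $x - 1$ can be sketched in Python as follows. This is an illustrative sketch only: the function names `interval_lengths` and `breakpoints` are hypothetical, and the guard `x >= 2` is an assumption added so that the division by $x - 1$ is well defined.

```python
def interval_lengths(n, x):
    """Lengths l_1, ..., l_k: the first n - k(x-1) equal x, the rest x - 1."""
    if x < 2 or n <= x * x:
        raise ValueError("this sketch assumes x >= 2 and n > x^2")
    k = n // (x - 1)
    r = n - k * (x - 1)  # number of intervals of length x
    return [x] * r + [x - 1] * (k - r)


def breakpoints(n, x):
    """Partial sums a_0 = 0 < a_1 < ... < a_k = n of the interval lengths."""
    a = [0]
    for length in interval_lengths(n, x):
        a.append(a[-1] + length)
    return a
```

With `n = 30` and `x = 5` this gives `k = 7` intervals, two of length 5 and five of length 4, which together cover all 30 vertices.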

Lemma 5. Fix an integer $x > 1$. Let $I$ be an interval of length $lx$ for some number $l > 0$. Suppose
that $t := px \in (0, 1/(l + 2))$. Then we have

\[ \mathbb{P}(I \text{ is NOT } x\text{-good}) \le \exp\left[ -\frac{t}{2p} \left( 2l \log\left(\sqrt{1 + l/t} - 1\right) + (l^2 + 4l + 2)t - 2t\sqrt{1 + l/t} - 2l \log l - l \right) \right]. \]



In our case, all intervals are of length $x - 1$ or $x$, so $l \in [1 - 1/x, 1]$. If we suppose that $t := px < 1/3$,
then

\[ \mathbb{P}(Z > 0) \le \mathbb{E}(Z) \le n \exp\left[ -\frac{t}{2p} \left( 2 \log\left(\sqrt{1 + 1/t} - 1\right) + 7t - 2t\sqrt{1 + 1/t} - 1 \right) \right]. \]



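In particular, if $p = p_n = \lambda/\log n$ and $x = t \log n/\lambda$ for some $t < 1/3$, we have

\[ \mathbb{P}(Z > 0) \le \exp\left[ \log n - \frac{t \log n}{2\lambda} \cdot \left( 2 \log\left(\sqrt{1 + 1/t} - 1\right) + 7t - 2t\sqrt{1 + 1/t} - 1 \right) \right] \to 0 \quad \text{as } n \to \infty \]

when

\[ \lambda < \frac{t}{2} \left( 2 \log\left(\sqrt{1 + 1/t} - 1\right) + 7t - 2t\sqrt{1 + 1/t} - 1 \right). \]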



One can easily check that (by taking $t = 0.07$)

\[ \sup_{t \in (0, 1/3)} \frac{t}{2} \left( 2 \log\left(\sqrt{1 + 1/t} - 1\right) + 7t - 2t\sqrt{1 + 1/t} - 1 \right) > 1/27. \tag{3.1} \]



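Thus, given $\lambda < 1/27$, we can choose $t \in (0, 1/3)$ such that (3.1) holds, and taking $x = t \log n/\lambda$ we
have

\[ \mathbb{P}\left( \sum_{i=0}^{k-1} \mathbf{1}\{\text{the interval } I_i \text{ is NOT } x\text{-good}\} > 0 \right) \to 0 \quad \text{as } n \to \infty. \]

This completes the proof. □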
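
The numerical claim behind (3.1) is easy to reproduce. The sketch below evaluates the expression inside the supremum at $t = 0.07$; the function name `lambda_bound` and the grid of $t$ values are assumptions of this illustration.

```python
import math


def lambda_bound(t):
    """(t/2) * (2 log(sqrt(1 + 1/t) - 1) + 7t - 2t sqrt(1 + 1/t) - 1)."""
    s = math.sqrt(1.0 + 1.0 / t)
    return 0.5 * t * (2.0 * math.log(s - 1.0) + 7.0 * t - 2.0 * t * s - 1.0)


# Value at the t suggested in the text, compared with 1/27.
value_at_007 = lambda_bound(0.07)
exceeds_threshold = value_at_007 > 1.0 / 27.0

# A coarse grid search over (0, 1/3) for the best t (illustrative only).
grid = [i / 1000.0 for i in range(1, 333)]
best_t = max(grid, key=lambda_bound)
```

At $t = 0.07$ the expression is roughly 0.0378, slightly above $1/27 \approx 0.0370$, which is what the proof needs.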



Proof of Lemma 5. Without loss of generality, suppose that the interval $I$ is $[1, lx]$. Recall that
$I$ is $x$-good if there is a vertex $u \in I$ such that $u$ has no people edges with $I_x := [1 - x, lx + x]$. Thus,
if $I$ is not $x$-good, then every vertex in $I$ has at least one people edge with $I_x$; in other words,
$\sum_{j \in I_x} \mathbf{1}\{i \text{ has a people edge with } j\} \ge 1$ for all $i \in I$, and thus

\[ \sum_{i \in I} \sum_{j \in I_x} \mathbf{1}\{i \text{ has a people edge with } j\} \ge lx. \]
The number of distinct pairs of vertices between $I$ and $I_x \setminus I$ is $2lx^2$, and the number of distinct pairs
of vertices within $I$ is $\binom{lx}{2}$. Therefore

\[ \sum_{i \in I} \sum_{j \in I_x} \mathbf{1}\{i \text{ has a people edge with } j\} = X + 2Y, \]

where $X \sim \text{Bin}(2lx^2, p)$, $Y \sim \text{Bin}(\binom{lx}{2}, p)$, and $X, Y$ are independent. In particular, we have

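\[ \mathbb{P}(I \text{ is not } x\text{-good}) \le \mathbb{P}(X + 2Y \ge lx) \le \mathbb{P}(X + 2Y' \ge lx) \le e^{-\theta lx}\, \mathbb{E}\left(e^{\theta X + 2\theta Y'}\right) \]

for any $\theta > 0$, where $Y' \sim \text{Bin}(l^2x^2/2, p)$ is independent of $X$. We have

\[ \mathbb{P}(X + 2Y' \ge lx) \le e^{-\theta lx} (1 - p + pe^{\theta})^{2lx^2} (1 - p + pe^{2\theta})^{l^2x^2/2}
\le \exp\left[ -lx\left( \theta - 2t(e^{\theta} - 1) - lt(e^{2\theta} - 1)/2 \right) \right], \tag{3.2} \]

where $t := px$. Note that we have

\[ \frac{\mathbb{E}(X + 2Y')}{lx} = (l + 2)px = (l + 2)t. \]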

Hence, under the assumption $t \in (0, 1/(l + 2))$, we have $lx > \mathbb{E}(X + 2Y')$ and $\sqrt{1 + l/t} - 1 > l$. Taking
$\theta = \log[(\sqrt{1 + l/t} - 1)/l]$ in (3.2), we finally have

\[ \mathbb{P}(I \text{ is not } x\text{-good}) \le \exp\left[ -\frac{t}{2p} \left( 2l \log\left(\sqrt{1 + l/t} - 1\right) + (l^2 + 4l + 2)t - 2t\sqrt{1 + l/t} - 2l \log l - l \right) \right]. \]

This completes the proof. □
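
The choice of $\theta$ in the proof of Lemma 5 is the maximizer of the exponent in (3.2), and substituting it reproduces the bracket in the statement of the lemma. The following sketch checks both facts numerically for a few sample values of $l$ and $t$; the function names and the sample values are assumptions of this illustration, and the check is numerical rather than a proof.

```python
import math


def exponent(theta, l, t):
    """Per-vertex exponent in (3.2): theta - 2t(e^theta - 1) - l t (e^{2 theta} - 1)/2."""
    return (theta
            - 2.0 * t * (math.exp(theta) - 1.0)
            - l * t * (math.exp(2.0 * theta) - 1.0) / 2.0)


def theta_star(l, t):
    """The value of theta used in the proof: log[(sqrt(1 + l/t) - 1) / l]."""
    return math.log((math.sqrt(1.0 + l / t) - 1.0) / l)


def closed_form(l, t):
    """Bracket from Lemma 5 divided by 2l, i.e. the exponent per unit of lx."""
    s = math.sqrt(1.0 + l / t)
    bracket = (2.0 * l * math.log(s - 1.0) + (l * l + 4.0 * l + 2.0) * t
               - 2.0 * t * s - 2.0 * l * math.log(l) - l)
    return bracket / (2.0 * l)
```

Because $lx = lt/p$, multiplying `closed_form(l, t)` by $lx$ gives $t/(2p)$ times the bracket, which is the exponent in Lemma 5.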



Propositions 1 and 2 give Theorem 1. 



4. Lower bound for puzzles with bounded degree. In this section, we prove the lower bound
in Theorem 2 for arbitrary puzzle graphs with bounded degree as $n \to \infty$.

Proposition 3. For any sequence of connected puzzle graphs with bounded maximum degree as
$|V| = n \to \infty$, $p_c(n) = \omega(1/n^b)$ for any $b > 0$.

Proof. Let $p = n^{-b}$, where $k \ge 2$ and $b \in (1/k, 1/(k-1))$ are fixed, and suppose that the maximum
degree of $(V, E_{\text{puzzle}})$ is at most $D$ for all $n$. After stage $i$ we have a collection of jigsaw clusters $\mathcal{C}_i$.
Initially $\mathcal{C}_0 = \{\{v\} : v \in V\}$, and after the first stage $\mathcal{C}_1$ is the set of connected components in the
graph $(V, E_{\text{people}} \cap E_{\text{puzzle}})$. Thereafter, two clusters $U, U' \in \mathcal{C}_i$ merge if there is an edge between
the two clusters in $E_{\text{people}}$ and an edge between the two clusters in $E_{\text{puzzle}}$. Therefore, if $U, U' \in \mathcal{C}_i$,
then $U, U' \subset W \in \mathcal{C}_{i+1}$ if and only if there is some nonnegative integer $\ell$ and a sequence of clusters
$U = U_0, U_1, \ldots, U_\ell = U' \in \mathcal{C}_i$ such that $U_j$ merges with $U_{j+1}$ at stage $i + 1$ for each $j$.
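
The stage-by-stage merging rule just described can be sketched in Python. This is a minimal illustration, not the authors' simulation code: vertices are assumed to be labelled 0, ..., n-1, both graphs are given as edge lists, clusters are tracked by a label per vertex, and the names `jigsaw_stage` and `jigsaw_percolation` are hypothetical.

```python
def jigsaw_stage(labels, people_edges, puzzle_edges):
    """One stage: merge every pair of clusters joined by at least one people
    edge and at least one puzzle edge, then take the transitive closure."""
    def cluster_pairs(edges):
        return {frozenset((labels[u], labels[v]))
                for u, v in edges if labels[u] != labels[v]}

    merging = cluster_pairs(people_edges) & cluster_pairs(puzzle_edges)

    # Union-find over the current cluster labels.
    parent = {c: c for c in set(labels)}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for pair in merging:
        a, b = tuple(pair)
        parent[find(a)] = find(b)
    return [find(c) for c in labels]


def jigsaw_percolation(n, people_edges, puzzle_edges):
    """Run stages until no merge occurs; return final labels and stage count."""
    labels = list(range(n))
    stages = 0
    while True:
        new_labels = jigsaw_stage(labels, people_edges, puzzle_edges)
        if len(set(new_labels)) == len(set(labels)):
            return labels, stages
        labels, stages = new_labels, stages + 1
```

The puzzle is solved exactly when the returned labels are all equal. On the 4-cycle puzzle with people edges (0,1), (0,2), (1,3), the first stage merges {0} and {1}, and the second stage merges the resulting cluster with both {2} and {3}.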

Observe that for $i \ge 1$, every merge event in stage $i + 1$ must involve at least one cluster that was
formed by a merge in stage $i$. Inspired by this observation, we let $\mathcal{A}_i \subset \mathcal{C}_i$ be the set of active clusters
that were the result of at least one merge in stage $i$ when $i \ge 1$, and let $\mathcal{A}_0 = \mathcal{C}_0$. Next we define the
events $E_i$ and $F_i$ for $i = 0, \ldots, k$ as

\[ E_i = \{ |\mathcal{A}_i| \ge C_i n^{1 - ib} \}, \qquad F_i = \{ \max\{|W| : W \in \mathcal{C}_i\} > L_i \}, \]

where $C_i$ and $L_i$ are constants that depend on $D$ and $k$, which we will define later. In words, $E_i$ is
the event that there are at least $C_i n^{1 - ib}$ active clusters following stage $i$, which is contained in the
event that at least $C_i n^{1 - ib}$ merges occur at stage $i$, because each active cluster must be the result of
at least one merge. $F_i$ is the event that the largest cluster following stage $i$ has more than $L_i$ vertices.
For sufficiently large $n$, the event $E_k$ is equivalent to the event that at least one merge occurs at stage
$k$, because $kb > 1$. Therefore, our goal is to show that $\mathbb{P}(E_k) \to 0$ and $\mathbb{P}(F_k) \to 0$ as $n \to \infty$, which
implies that no merges occur after stage $k$ and that the largest cluster has size at most $L_k$, so the
puzzle remains unsolved.

Our strategy is to prove this by induction on $i$. It is trivially true that $\mathbb{P}(E_0) = 0$ and $\mathbb{P}(F_0) = 0$
with $C_0 = 2$ and $L_0 = 2$. Now, let us assume that $\mathbb{P}(E_i) \to 0$ and $\mathbb{P}(F_i) \to 0$ as $n \to \infty$ for some
$i \in \{0, 1, \ldots, k - 1\}$, which implies that $\mathbb{P}(E_i^c \cap F_i^c) \to 1$. On the event $E_i^c \cap F_i^c$, we know that the
number of active clusters is $|\mathcal{A}_i| < C_i n^{1 - ib}$, and the largest cluster has at most $L_i$ vertices. The latter
implies that every cluster has fewer than $DL_i$ neighboring clusters in $(V, E_{\text{puzzle}})$, because each vertex
has at most $D$ total neighboring vertices in the puzzle graph. We will use this fact in two ways. First,
we will show that the number of merges at stage $i + 1$ is small, because each active cluster after stage
$i$ has relatively few opportunities to merge. Second, we will show that no path of neighboring clusters
longer than length $k - 1$ merges at stage $i + 1$, because few such paths exist.

To meet our first goal, we define a random variable $I_{\{A,B\}}$ for each pair of an active cluster $A \in \mathcal{A}_i$
and a neighboring cluster $B \in \mathcal{C}_i$ such that $B \ne A$ and there is an edge in $E_{\text{puzzle}}$ between $A$ and $B$.
The random variable $I_{\{A,B\}}$ is the indicator of the event that $A$ and $B$ merge at stage $i + 1$. On the
event $F_i^c$, the probability that $A$ merges with $B$ is at most

\[ 1 - (1 - n^{-b})^{(DL_i)^2} \le 1 - e^{-2(DL_i)^2 n^{-b}} \le 2(DL_i)^2 n^{-b}, \tag{4.1} \]



where we assume that $n$ is large so that $n^{-b} < 1/2$ and we use the fact that $e^{-2x} \le 1 - x$ for
$x \in (0, 1/2)$. For convenience, we now order the clusters in $\mathcal{C}_i$ so that $A_1, A_2, \ldots, A_{|\mathcal{A}_i|} \in \mathcal{A}_i$ and
$A_{|\mathcal{A}_i|+1}, A_{|\mathcal{A}_i|+2}, \ldots, A_{|\mathcal{C}_i|} \in \mathcal{C}_i \setminus \mathcal{A}_i$. Therefore, on $E_i^c \cap F_i^c$, the total number of merges that occur in
stage $i + 1$,

\[ \sum_{j=1}^{|\mathcal{A}_i|} \sum_{\ell=j+1}^{|\mathcal{C}_i|} I_{\{A_j, A_\ell\}}, \]

is stochastically dominated by $X_i \sim \text{Binomial}(DL_i C_i n^{1-ib},\ 2(DL_i)^2 n^{-b})$. This is because there are at
most $DL_i C_i n^{1-ib}$ distinct pairs of neighboring clusters, at least one of which is active, and the events
that each of these pairs merges at stage $i + 1$ are independent, because they depend on disjoint sets of
edges in the people graph. If we let $C_{i+1} = 4(DL_i)^3 C_i$ (this is $2\mathbb{E}X_i/n^{1-(i+1)b}$), then by Chebyshev's
inequality

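\[ \begin{aligned}
\mathbb{P}(E_{i+1} \mid E_i^c \cap F_i^c) &\le \mathbb{P}\left( \sum_{j=1}^{|\mathcal{A}_i|} \sum_{\ell=j+1}^{|\mathcal{C}_i|} I_{\{A_j, A_\ell\}} \ge C_{i+1} n^{1-(i+1)b} \ \Big|\ E_i^c \cap F_i^c \right) \\
&\le \mathbb{P}\left( X_i \ge C_{i+1} n^{1-(i+1)b} \right) \\
&= \mathbb{P}(X_i - \mathbb{E}X_i \ge \mathbb{E}X_i) \\
&\le (\mathbb{E}X_i)^{-1} = O(n^{-1+(i+1)b}) \to 0.
\end{aligned} \]

Since $\mathbb{P}(E_i^c \cap F_i^c) \to 1$, we have that $\mathbb{P}(E_{i+1}) \to 0$.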
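
The Chebyshev step in the preceding display can be spelled out as follows. This is a short worked step that uses only the parameters of $X_i$ stated above, together with the standard fact that a binomial variance is at most its mean:

\[ \mathrm{Var}(X_i) \le \mathbb{E}X_i, \qquad
\mathbb{P}(X_i - \mathbb{E}X_i \ge \mathbb{E}X_i) \le \frac{\mathrm{Var}(X_i)}{(\mathbb{E}X_i)^2} \le \frac{1}{\mathbb{E}X_i}, \qquad
\mathbb{E}X_i = DL_i C_i n^{1-ib} \cdot 2(DL_i)^2 n^{-b} = 2(DL_i)^3 C_i\, n^{1-(i+1)b}. \]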

Next we must show that the largest cluster after stage $i + 1$ has size at most $L_{i+1}$. Define a cluster path
of length $\ell \ge 0$ between $U, U' \in \mathcal{C}_i$ to be a sequence of distinct clusters $U = U_0, U_1, \ldots, U_\ell = U' \in \mathcal{C}_i$
such that $U_j$ and $U_{j+1}$ are puzzle-adjacent for all $j \in \{0, \ldots, \ell - 1\}$. For a fixed cluster $A \in \mathcal{C}_i$, let $Y_A$
denote the number of paths of length $k$ that start at $A$ (so $U_0 = A$) and such that $U_j$ will merge with
$U_{j+1}$ at stage $i + 1$ for each $j \in \{0, \ldots, k - 1\}$. For any cluster path $U_0, \ldots, U_k$, the probability that $U_j$
and $U_{j+1}$ merge at stage $i + 1$ is bounded above by $2(DL_i)^2 n^{-b}$ on the event $F_i^c$, by inequality (4.1).
The number of cluster paths of length $k$ in $\mathcal{C}_i$ that start at $A$ is bounded by $(DL_i)^k$ on $F_i^c$, because
each cluster has at most $DL_i$ neighboring clusters. Therefore, by Markov's inequality



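\[ \mathbb{P}\left( \bigcup_{A \in \mathcal{C}_i} \{Y_A \ge 1\} \ \Big|\ F_i^c \right) \le n\, \mathbb{P}(Y_A \ge 1 \mid F_i^c)
\le n \left[ (DL_i)^k \left( 2(DL_i)^2 n^{-b} \right)^k \right] = O(n^{1-kb}) \to 0. \]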

This implies that there are no cluster paths of length k or longer that merge at stage i + 1. In turn, 
this implies that the largest cluster after stage i + 1 is smaller than Lj+i := L k with high probability 
on the event $F_i^c$, so $\mathbb{P}(F_{i+1}) \to 0$, which completes the proof. □



Propositions 1 and 3 give Theorem 2. 



5. Discussion and future directions. In our early attempts to understand jigsaw percolation 
on the ring graph, we tried to use simulations to inform our conjectures about the critical value p c (n) 
(Figure 5a). However, as with bootstrap percolation [22], we expect a slow rate of convergence to the 
critical value. 



Conjecture 1. For jigsaw percolation on the ring puzzle graph with an Erdős-Rényi people graph,
there exist constants $b > 0$, $c_1 > 0$ and $c_2$ such that

\[ p_c(n) = \frac{c_1}{\log n} + \frac{c_2}{(\log n)^{1+b}} + o\left( (\log n)^{-(1+b)} \right). \]

If true, this means that estimating $c_1$ to within 1% via simulation would require taking $n$ to be at
least $\exp[(100 c_2/c_1)^{1/b}]$, which is prohibitively large if $|c_2/c_1|$ is much larger than 0.1 and $b$ is at most
1. However, we expect our upper bound on $p_c(n)$ to be tight for the ring graph.
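
To see how quickly the required $n$ grows, the threshold $\exp[(100 c_2/c_1)^{1/b}]$ can be evaluated for a few values of the constants. The values used here are hypothetical placeholders, since Conjecture 1 does not specify $c_2$ or $b$; the function names are likewise assumptions of this sketch.

```python
import math


def min_log_n(c1, c2, b, rel_tol=0.01):
    """Smallest log n at which the second-order term |c2| / (log n)^(1+b)
    is at most rel_tol times the leading term c1 / log n."""
    return (abs(c2) / (rel_tol * c1)) ** (1.0 / b)


def min_n(c1, c2, b, rel_tol=0.01):
    """Corresponding n = exp(min_log_n); returns inf if it overflows a float."""
    try:
        return math.exp(min_log_n(c1, c2, b, rel_tol))
    except OverflowError:
        return math.inf
```

With the hypothetical choice $|c_2/c_1| = 1$ and $b = 1$, the requirement is $\log n \ge 100$, so $n$ would have to exceed $e^{100}$, far beyond the $n = 1000$ used in the simulations reported here.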

Conjecture 2. For jigsaw percolation on the ring puzzle graph, $c_1 = \pi^2/6$.

This conjecture is based on a computation (not shown here) that implies that a two-sided growth
version of the sufficient condition used in the proof of Proposition 1 (i.e., the one-sided requirement
that $j$ is connected to $\{1, 2, \ldots, j - 1\}$ for each $j$) yields the same upper bound of $\pi^2/(6 \log n)$ but
with a correction of order $(\log n)^{-3/2}$. Of course, even when the two-sided growth process fails starting
from every vertex, it may still be possible to solve the puzzle by merging the clusters formed. However,
if none of these "two-sided growth clusters" intersect, then the puzzle is unlikely to be solved, so we
suspect that $c_1 = \pi^2/6$ is the correct lower bound.

Of particular interest for future study, the number of steps until the process stops measures how
efficiently the network solves the puzzle or determines that it cannot be solved. We numerically
simulated the average number of steps until the process terminates for the ring puzzle (Figure 5b). As
expected, the number of steps increases around the phase transition $p_c(n)$. The process terminates
quickly when the puzzle is not solved, and Lemma 1 implies that the number of steps is at most
$O(\log n/p_n)$, though this is not the best bound possible. The proof of Proposition 2 shows that for the
ring puzzle with $p_n < 1/(27 \log n)$, the largest jigsaw cluster (and hence number of steps) is smaller
than $\log n$. As $p_n$ increases near $p_c(n)$, the puzzle may be solved, but just barely, so the number of
steps required is largest. As $p_n$ increases further, more people-edges lead to larger clusters early in
the process. Determining the form of the function in Figure 5b is an interesting open problem.
[Figure 5. Panel (a): fraction of trials in which the people graph solves the $n = 1000$ ring puzzle, $\mathbb{P}(\text{Solve})$. Panel (b): average number of steps before the process stops.]

Fig 5: Simulations of jigsaw percolation on a ring of size $n = 1000$, with 200 trials for 21 equally
spaced values of $p \in [0, 1.05 \times \pi^2/(6 \log n)]$ (which took 57 days on a department server). Dots are
averages of 200 trials, while shaded gray areas denote ±1 standard deviation. The estimated critical
value $p_c^{\text{est}} \approx 0.11$, denoted in red, is obtained by fitting a line between the two data points with
$\mathbb{P}_p(\text{Solve})$ just below and above 1/2. Characterizing the average number of time steps before the
process terminates (Fig. 5b) remains an open question.
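
The interpolation described in the caption, a line through the two data points that bracket a solve fraction of 1/2, can be sketched as follows. The function name `estimate_critical_p` is hypothetical, and the data in the usage note are made up for illustration; they are not the values behind Figure 5.

```python
def estimate_critical_p(ps, solve_fractions, level=0.5):
    """Linearly interpolate the p at which the solve fraction crosses `level`.

    ps must be increasing; returns None if no consecutive pair brackets level.
    """
    points = list(zip(ps, solve_fractions))
    for (p_lo, f_lo), (p_hi, f_hi) in zip(points, points[1:]):
        if f_lo < level <= f_hi:
            return p_lo + (level - f_lo) * (p_hi - p_lo) / (f_hi - f_lo)
    return None
```

For instance, with made-up solve fractions 0.25 at $p = 0.1$ and 0.75 at $p = 0.2$, the interpolated estimate is $p = 0.15$.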

Open Problem 1. For the ring puzzle, let $N_n$ be the smallest value of $i$ such that $\mathcal{C}_i = \mathcal{C}_{i+1}$.
Determine the asymptotic behaviors of

\[ \mathbb{E}_{p_n}[N_n \mid \text{Solve}^c] \quad \text{and} \quad \mathbb{E}_{p_n}[N_n \mid \text{Solve}] \]

as functions of $p_n$.

Finally, we suspect that the phase transition at p c (n) is sharp, in the following sense. 

Conjecture 3. Define $p_\epsilon(n)$ as the unique $p$ for which $\mathbb{P}_p(\text{Solve}) = \epsilon$. Then

\[ p_\epsilon(n)/p_{1-\epsilon}(n) \to 1 \]

as $n \to \infty$ for any $\epsilon \in (0, 1)$ fixed.

Other avenues of future study include extensions and modifications of jigsaw percolation. Different 
people and puzzle graphs (especially ones with unbounded degree) are one natural direction, with 
mathematical and practical interest. 

Open Problem 2. Consider other people and puzzle graphs, especially puzzles with unbounded 
degree. 

Another natural direction is to modify the model to make it more realistic. For example, by analogy 
with the "adjacent-edge" modification of explosive percolation [16], in the "adjacent-edge" (AE) version 
of jigsaw percolation, the rule for merging two clusters U and W requires that the people- and puzzle- 
edges between $U$ and $W$ coincide on at least one vertex. That is, in the AE rule, two jigsaw clusters $U$
and $W$ merge only if there exist $u \in U$ and $w, w' \in W$ such that $(u, w) \in E_{\text{puzzle}}$ and $(u, w') \in E_{\text{people}}$.
In this version, a single person must determine whether her friends' jigsaw clusters fit with her piece of 
the puzzle, but she does not need to be aware of how her entire jigsaw cluster fits with the clusters of 
her acquaintances. This process is slightly more local, so we suspect that more detailed, rigorous results 
are possible. Note that all of our results for jigsaw percolation also hold for AE jigsaw percolation. 
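
The difference between the AE rule and the original merge rule can be illustrated with a small Python sketch. The representation (dictionaries mapping each vertex to a set of neighbours) and all function names are assumptions of this illustration. Treating the rule symmetrically, so that the shared vertex may lie in either cluster, is also an assumption, since the text states the condition only with the shared vertex $u$ in $U$.

```python
def has_ae_witness(U, W, people_adj, puzzle_adj):
    """True if some u in U has both a puzzle-neighbour and a people-neighbour in W."""
    return any(bool(puzzle_adj[u] & W) and bool(people_adj[u] & W) for u in U)


def ae_can_merge(U, W, people_adj, puzzle_adj):
    """AE rule, applied symmetrically (an assumption of this sketch)."""
    return (has_ae_witness(U, W, people_adj, puzzle_adj)
            or has_ae_witness(W, U, people_adj, puzzle_adj))


def standard_can_merge(U, W, people_adj, puzzle_adj):
    """Original rule: some people edge and some puzzle edge join U and W."""
    people_link = any(people_adj[u] & W for u in U)
    puzzle_link = any(puzzle_adj[u] & W for u in U)
    return people_link and puzzle_link
```

If the only puzzle edge between the clusters is at vertex 1 and the only people edge is at vertex 0, the original rule merges the clusters but the AE rule does not, because no single person sees both edges.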
Open Problem 3. Does the behavior of AE jigsaw percolation differ significantly from that of
jigsaw percolation for some class of puzzle graphs? Can more precise statements be made about the
behavior of AE jigsaw percolation on the ring graph?

Another potentially interesting modification is to change the map from people to puzzle pieces so 
that it is no longer bijective. This would allow many people to have the same idea and a single person 
to have multiple ideas. 

Open Problem 4. What is the effect of changing the map between people and puzzle pieces on a 
network's ability to solve the puzzle? 

In this paper, each person has one unique puzzle piece (or idea). The critical value $p_c(n)$ marks the
phase transition in the connectivity of the Erdős-Rényi people graph at which it begins to solve the
puzzle with high probability. For a large class of puzzle graphs ($n$-cycles in Theorem 1, bounded-degree
puzzles in Theorem 2), we show that this critical value decreases with $n$. However, the critical
average degree, $n p_c(n)$, increases with the size $n$ of the social network and of the puzzle. Thus, as social
networks and the puzzles they try to solve grow commensurately in size, people must interact with
more people in order to realize enough compatible, partial solutions. This model therefore suggests
a mechanism for the recent statistical claims that as cities become more dense, people interact more
and hence innovate more [5, 9]. Furthermore, most social networks wish to minimize communication
overhead; the critical value $p_c(n)$ indicates the minimal communication needed to collaboratively solve
large puzzles.
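
The growth of the critical average degree follows directly from the stated bounds. As a short worked step, using only Theorem 1 (ring puzzle) and Proposition 3 (bounded-degree puzzles):

\[ p_c(n) = \Theta\left( \frac{1}{\log n} \right) \ \Longrightarrow\ n\, p_c(n) = \Theta\left( \frac{n}{\log n} \right) \to \infty, \qquad
p_c(n) = \omega(n^{-b}) \ \text{for every } b > 0 \ \Longrightarrow\ n\, p_c(n) = \omega(n^{1-b}) \ \text{for every } b > 0. \]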

Surprisingly, social networks with power law degree distributions lack the connectivity needed to 
solve bounded-degree puzzles (Corollary 1). However, scientific collaboration networks manage to solve 
puzzles despite their heavy-tailed degree distributions [4, 41, 42]. This highlights the importance of 
considering more realistic assumptions in the model and of drawing from (still nascent) studies on 
knowledge spaces [11]. 

This work, the first step in analyzing a rich, mathematical model, begins to suggest why certain 
social networks stifle creativity and why others innovate. With a homogeneous degree distribution and 
sufficiently many interactions, a social network can collectively merge the pieces of a large puzzle — and 
perhaps merge the ideas that lead to a great idea. 

Acknowledgements. We thank Rick Durrett, M. Puck Rombach, Peter J. Mucha, Raissa M. 